Docking Analysis and Multidimensional Hybrid QSAR Model of 1,4-Benzodiazepine-2,5-Diones as HDM2 Antagonists

The inhibitors of p53-HDM2 interaction are attractive molecules for the treatment of wild-type p53 tumors. In order to search more potent HDM2 inhibitors, docking operation with CDOCKER protocol in Discovery Studio 2.1 (DS2.1) and multidimensional hybrid quantitative structure-activity relationship (QSAR) studies through the physiochemical properties obtained from DS2.1 and E-Dragon 1.0 as descriptors, have been performed on 59 1,4-benzodiazepine- 2,5-diones which have p53-HDM2 interaction inhibitory activities. The docking results indicate that π-π interaction between the imidazole group in HIS96 and the aryl ring at 4-N of 1,4-benzodiazepine-2,5-dione may be one of the key factors for the combination of ligands with HDM2. Two QSAR models were obtained using genetic function approximation (GFA) and genetic partial least squares (G/PLS) based on the descriptors obtained from DS2.1 and E-dragon 1.0, respectively. The best model can explain 85.5% of the variance (R 2adj ) while it could predict 81.7% of the variance (R 2 cv ). With this model, the bioactivities of some new compounds were predicted.


Introduction
The p53 is a multifunctional protein that regulates genes which can induce either cell cycle arrest or apoptosis (1-3). In human cells, p53 activity is normally modulated by the human double minute-2 (HDM2) protein, the homologue of murine double minute-2 (MDM2) protein (4).
HDM2 binds to and blocks the p53 transactivation domain, inhibiting its transcriptional activity. The dissociation of the binding between these two proteins is therefore an attractive therapeutic target for the treatment of wild-type p53 tumors (5). This action can be implemented through neutralizing antibodies (6), p53 peptides (7) or protein fusions (8,9), which leads to the control of proliferation. Additionally, antisense HDM2 oligonucleotides increase both p53 levels and activity by reducing HDM2 protein levels (10, R 2 cv hardware and computational theory, many molecule structural properties can be obtained through the computational process. Some software products provide these functions, such as Discovery studio (24), MOE (25), SYBYL (26) and E-Dragon (27), etc. For example, Discovery Studio 2.1 can calculate more than 100 kinds of molecular properties and more than 1600 molecular properties can be obtained from E-Dragon 1.0. These properties can be used expediently as descriptors to construct a QSAR model for drug screening. In 2009, Roys (28) calculated 2D and 3D properties of 116 diverse classes of aromatase inhibitors. With the assistance of genetic function approximation (GFA) and genetic partial least squares (G/PLS), QSAR models for non-steroid compounds were constructed using these 2D and 3D descriptors. The best (externally) predictive model was a GFA model with spline option using combined set (2D and 3D) descriptors and its predictive R 2 (R pred 2 ) reached 0.687. Dai et al. (29) used the computed molecular properties and docking scores (CDOCKER interaction energy) constructed a 2D-3D hybrid QSAR model for steroid aromatase inhibitors with the correlation coefficient of R 2 = 0.729 and Leave-one-out Cross-Validation of R 2 = 0.667. In these two studies, GFA was successfully employed to select the best combined set of descriptors from large number of physiochemical properties for the construction of high quality QSAR models. In order to construct a quantitative predictive model for the bioactivity of new 1,4-benzodiazepine-2,5-dione compounds as HDM2 antagonists, two QSAR models were built in this paper based on the computational properties of 59 1,4-benzodiazepine-2,5-dione compounds with known bioactivities. In addition, the molecular docking of BDPs to the HDM2 cleft was conducted using CDOCKER docking protocol in Discovery Studio 2.1(DS2.1) and the interaction of these compounds was analyzed with LigPlot (30).

Dataset and descriptors
The 1,4-benzodiazepine-2,5-dione chiral ligands with known inhibitory activities upon 11). Since small molecular antagonists of HDM2 are easily synthesized, some investigations on them have been reported, but many of these molecules have low potency and limited cellular activity (12)(13)(14). Most recently, some reports describing a series of 1,4-benzodiazepine-2,5dione compounds (BDPs) that act as the potent antagonists of the HDM2-p53 interaction (15). A library of BDPs was designed by some investigators using directed diversity method (16)(17)(18). These compounds were synthesized and screened with thermofluor screening technology (19) and further evaluated with a fluorescence polarization (FP) assay. In order to further understand the antagonizing mechanism of these compounds with HDM2-p53 interaction, Grasberger (11) et al. studied the crystal structure of BDPs with HDM2. The crystal structure confirmed that the BDPs occupy the p53 peptide binding site and they can mimic the α-helix of p53 peptide and may represent a promising scaffold to develop HDM2 antagonists. The three substituted aryl rings of the BDP overlay very closely with PHE19, TRP23 and LEU26 in the p53 binding pocket of HDM2 (20) and some structure-activity relationship (SAR) studies were deduced and explained (15). Recently, Wang et al. (21) conducted a benzodiazepinedione/peptide-based 3D-QSAR analysis using the comparative molecular field analysis (CoMFA) and comparative molecular similarity index analysis (CoMSIA), however, their molecular poses of ligands in the pocket of HDM2 from docking and crystal structure were only considered a little by taking a receptorguided consensus dynamics alignment since the poses of the alignments method are different from those of docking results. On the other hand, there is no explicit quantitative equation reported for the QSAR of BDPs as inhibitors of the p53-HDM2 although some quantitative structure-activity relation (QSAR) models have been reported for other HDM2 inhibitors (22,23). Whereas, BDPs are a kind of most potent HDM2 inhibitors, a high quality equation as QSAR model is necessary for finding new BDPs as potent HDM2 inhibitors. It is also important for new HDM2 inhibitors from other structural compounds.
Lately, with the development of computer HDM2-p53 interaction reported in the literature (4,15,31,32) were used as the model dataset for the present study (Table 1). The experimental protocol for the determination of inhibitory activities for all the compounds was the FP assay.
The inhibitory values in IC 50 (μM) were converted to the logarithmic scale pIC 50 (mM) before being used for the subsequent QSAR analyses as the response variable. Various physiochemical properties of the ligands obtained using the protocol of Calculate Molecular Properties in DS2.1 and E-Dragon 1.0 respectively, were selected as candidate descriptors used for QSAR construction. These physiochemical properties include 1D (Element counts, functional group counts) 2D (AlogP,LogD,Num_ StereoAtoms,Num_Rings,Molecular_Weight,Molecular_SurfaceArea,Num_H_Acceptors,Num_H_Donors,Num_RotatableBonds,2D autocorrelations,topological descriptors such as BIC,V_ADJ_equ,V_DIST_equ,CHI_3_P,CHI_V_3_P,CHI_1,CHI_3_C,CIC,IAC_Mean,IAC_Total,IC and SIC,etc.) and 3D (3D-NoRSE descriptors, Dipole, Jurs descriptors, shadow indices and Molecular_Volume, etc.) parameters. All the definition of these descriptors can be seen in the Help of DS2.1 and E-Dragon 1.0 software. For the calculation of 3D descriptors, multiple conformations of each molecule were generated using the Generate Conformations protocol with optimal search as a conformational search method. Each conformer was subjected to an energy minimization procedure using smart minimizer under the consistent force field (CFF) to generate the lowest energy conformation for each structure. The charges were calculated according to the Gasteiger method.

Docking process of 1,4-benzodiazepine-2,5diones to HDM2 protein
The crystal structure of HDM2 in complex with a benzodiazepine ligand ((S)-2-(4chlorophenyl)-2-((S)-3-(4-chlorophenyl)-7iodo-2,5-dioxo-2,3-dihydro-1H-benzo[e] [1,4] diazepin-4(5H)-yl) acetic acid, ligand 25 in Table 1, Figure 1) has been obtained from the RCSB protein data bank (http://www.pdb.org, ID:1T4E). The docking studies were conducted using CDOCKER of Receptor-ligand Interactions protocol in DS 2.1(33). The ligands and the HDM2 protein were all pretreated initially. For ligand preparation, the 3D structures of all the 1,4-benzodiazepine-2,5-diones were generated with ChemBioOffice 2008 (34) and optimized with AM1 method. Thereafter, the ligands were treated with Prepare Ligands protocol in DS 2.1. All the duplicate structures were removed and the options for ionization change, tautomer generation, isomer generation and 3D generator have been set true. For protein preparation, the hydrogen atoms were added with the pH in the range of 6.5-8.5 with DS2.1. HDM2 protein was defined as a total receptor and the site sphere was built with diameter of 10 Å based on the ligand 25. Then, the preexisting ligand 25 was removed. From the receptor-ligand interaction section of DS 2.1, CDOCKER was chosen and all the default operating parameters were used unless pre-declared. During the docking process, a freshly prepared ligand (compound from the dataset in Table 1) prepared by us was placed. CHARMm was selected as the force field. The molecular docking was performed with a simulated annealing method to minimize the CDOCKER energy (E CD ) for obtaining an optimum pose. This method was independently described by Scott Kirkpatrick et al. in 1983 (35) and by Vlado Cerny in 1985 (36). The method is a technique suitable for the optimization problems of large scale, especially ones where a desired global extremum is hidden among many, poorer, local extrema. The name comes from annealing in metallurgy, a procedure involving heating and controlled cooling of a material to increase the crystal orders and reduce their defects. The heat makes the atoms to become unstuck from their initial positions with a local minimum of internal energy and move randomly through states of higher energy. The slow cooling procedure gives them ample time for redistribution of the atoms to find configurations with lower internal energy than the initial one. Similar to this physical process, each step of the simulated annealing algorithm changes the current solution through a random "nearby" solution with a probability that relies on both the difference between the corresponding function values and a global parameter T (named as temperature), which is gradually decreased during the process. The changes of current solution depend on T and are organisms) are derived and tested repeatedly until an approximate optimal solution is found. In this study, 59 1,4-benzodiazepine-2,5-dione chiral compounds in Table 1 were selected as the training set. There are more than 120 2D and 3D physiochemical properties obtained from the protocol of Calculate Molecular Properties in DS 2.1 and 1621 physiochemical properties obtained from E-Dragon 1.0. But only a subset of these properties is statistically significant according to the correlation with the compounds activities. Some redundant parameters from the protocol of Calculate Molecular Properties in DS 2.1 were omitted using GFA protocol in DS 2.1 in order to save time and space and to decrease the complexity of the QSAR model (The reduction of 1D-3D physiochemical descriptors from E-Dragon 1.0 to build the QSAR model was conducted using the Material Studio 4.0 (39)).

Statistical quality evaluation and model validation
For a successful QSAR model, it should be robust enough to be capable of making accurate and reliable predictions of the biological activities, thus, the predictive capacity of the developed QSAR models from the training set should be validated. There are several methods to confirm the quality of QSAR model. In our experiments, Friedman lack-of-fit (LOF) was used for the selection of GFA-derived equations, while for G/PLS equations, the correlation coefficient R 2 and the adjusted R 2 ( 2 adj R ), were taken as objective functions to select an equation. Leaveone-out cross-validation R 2 ( 2 cv R ) was employed to validate the predictivity of generated QSAR model equations. The LOF [40] is designed to resist over-fitting, which is a problem often encountered in constructing statistical models. Since the number of descriptors available in HDM2 inhibitor QSAR analysis normally exceeds the number of observations (training set compounds), the ability to prevent over-fitting of GFA is critical to the successful construction of a statistically significant QSAR model. The smoothing factor was set to 0.5, which controls the model size, and GFA was used to optimize the QSAR models having different numbers of descriptor terms. For a given smoothing factor, the optimization of a QSAR model almost random when T is large, but increasingly "downhill" (for a minimization value) as T comes to zero. The chance for "uphill" moves potentially and prevents the method from becoming stuck at a local optimal value. In this study, the heating steps were set as 2000 with 700, heating target temperature. The cooling steps were set as 5000 with 300 as the cooling target temperature. Ten molecular docking poses saved for each ligand were ranked according to -CDOCKER energy (-E CD ). The pose with the highest -E CD was chosen as the most suitable pose for the subsequent pose analysis. After the end of molecular docking, the interactions of the docked receptor (HDM2) with ligand were analyzed with LigPlot.

QSAR model development
For a comparatively small quantity of samples with large available variables, the number of variables (descriptors) should be reduced far smaller than that of samples so that the obtained equation has the statistical meaning. But it is difficult to choose which property is more suitable as the descriptor to build QSAR models. Recently, this problem can be solved with genetic function approximation (GFA) technique. The principles about GFA can be seen elsewhere (37, 38). It involves the multivariate adaptive regression algorithm combining with the genetic algorithm (GA) to evolve the population of equations (each containing only a subset of variables) that best fit the training set data. With this method, a series of potential solutions to a problem (the population of was considered to be realized when descriptor usage became constant and independent of an increasing number of crossover operations. All the descriptors in the QSAR trial descriptor pool were used as linear terms during the GFA to generate QSAR models.

Analysis of the binding site
To understand the molecular details of HDM2 binding through the BDPs, the binding site view was made using the DS visualizer 2.5 ( Figure 2). The specific cleft to which the ligand binds (within 4 Å), contains both polar (GLY16, SER17, GLY58, ILE61, TYR67, GLN72, HIS73, HIS96, ILE99, TYR100) and nonpolar (LEU54, PHE55, MET62, VAL75, VAL93, PHE86, PHE97) amino acids and the inhibitor (contains phenyls A, B and C) occupies the same pockets as the peptide sidechains PHE 19, TRP 23, and LEU 26 of p53 ( Figure 2A) ( 20). The HDM2 interaction with the ligand 25 was analyzed with LigPlot and it was found that the nonspecific hydrophobic contacts are largely responsible for their interaction. The hydrophobic contacts of HIS 96, LEU 54, GLY 16 with the ring of phenyl C, ILE 99, Leu 57, GLY 58 with phenyl B, and ILE 61, MET 62 with phenyl A are shown in Figure 3A. There are also two hydrogen bonds between the carboxyl group of Ligand 25 and hydroxyl group of SER17, iodine of Ligand 25 and hydroxyl group of GLN 72 respectively. It can be concluded that the binding cleft of HDM2 is predominantly hydrophobic and largely nonspecific Van Der Waals contacts are responsible for the interaction between the ligand and HDM2 hydrophobic pocket (11).
The 1,4-benzodiazepine-2,5-dione compounds (BDPs) used to construct QSAR models are shown in Table 1. All the ligands listed in Table 1 are different from each other in R1 to R4, which lead to the different conformations adopted in the cleft of HDM2, therefore, the interactions with HDM2 are different and the bioactivities are varied.

Analysis of the optimum molecular docking poses of different ligands
The CDOCKER algorithm was applied to dock the ligands listed in Table 1. This docking technique needs a site sphere surrounding the ligand (the radius was set to 10 Å in this study) and this technique was tested with the ligand 25. Our docking pose of freshly prepared model of ligand 25 with CDOCKER also corroborates with that in crystal structure (RMSD = 0.425) indicating the reliability of this docking procedure ( Figure 2B). The specific residues surrounding 25 within 4Å (HIS96 is displayed in stick. π-π interaction is displayed in orange line and H-bond is in green dash line); (B) Comparison of Ligand 25 poses from X-ray crystal structure (red) and from CDOCKER docking (gray) and HIS96 is also shown in stick as location reference (The image was made with Chimera 1.3(41)). Each ligand in Table 1 was docked as described previously and the pose with the highest -E CD was considered as the optimum pose for each ligand. For the ligands in the top bioactivity range, such as ligands 25, 51 and 55 (IC 50 < 0.5 μM), there exists π-π interaction between the imidazole group in HIS96 and the aryl ring in R 3 (the occasion for ligand 25 is seen in Figure 2A). It can be considered as one of the key factors for the combination of the ligands with HDM2. Although the pocket of HDM2 is mainly hydrophobic, the additional hydrogen bonds can be formed. By introducing an amine functional group in the ortho position of aryl ring in R 3 , an additional hydrogen bond with VAL93 is formed for ligand 51 or 55, which is consistent with the previous prediction in literature ( Figure  3B) (33).The extra hydrogen bond formation increased the inhibitory potency of these ligands.
The R 1 changes in fatty acid, alkyl ether, alkyl (acyl) morpholine, alkyl (acyl) piperazine etc. (ligands 48-59). They are introduced in the ligand merely for improving the water-solubility of the ligands (33). From the docking position, we can see that these groups have nearly little contact with the acceptor protein ( Figure 4A shows the morpholine group in ligand 51 stretches out of the HDM2 cleft), and in turn, have little effect on enhancing the ligand affinity. On the other hand, they improve the solvent effect, which has the negative effect on the ligands' affinity to the target.
The chirality change at R 2 and α-C of R 3 also alters the bioactivity seriously, for example, ligands 49 and 50, 51 and 52, 53 and 54, are different from each other in R 2 or α-C of R 3 correspondingly ( Figure 4B shows the docking pose difference between the ligands 51 and 52). Instead of the aryl ring in R 3 of 51 (in violet), the aryl ring in R 2 formed the π-π interaction with the imidazole group of HIS96 in ligand 52 (grey). This conformation makes the extra hydrogen bond existed in ligands 51 which can't be formed for ligand 52 that may lower the affinity of the ligand with HDM2. Figure 4C shows the comparison of docking poses of I substituted (ligand 25) and Cl substituted (ligand 35) phenyl A ligands. From the comparison of the docking results, it was found that iodine group in phenyl A is suitable for the cleft space. When the iodine group of phenyl A was replaced by other halogen, cyanogen, amide, alkyl or alkyne groups, the ligand sizes could not match the HDM2 cleft as exactly as iodine, therefore, the interaction between the ligand and HDM2 was decreased and as a result, the HDM2 inhibitory activities were lowered.
The FP activity of ligands can be changed by substituted groups in aryl groups of R 2 and R 3 . The absence of a substituent in phenyl group resulted in a dramatic loss of potency as exemplified by ligand 28. Proper substituent size in the para-position of the phenyl group will make the ligand match the HDM2 cleft more suitably and the results show that the ethyl group (ligand 10), Cl, CF 3 and OCF 3 (ligands 12-14) are optimal for these substituents. All three aryl groups are in the cleft of HDM2 for these ligands. Substitution at either the ortho-   Table 3. Observed and predicted HDM2 inhibitory activities, physiochemical properties of different ligands from E-Dragon 1.0 used for the construction of QSAR models. or meta-positions of phenyl group in R 2 results in a sharp loss in activity due to the spatial clag generated for HDM2-liangd interaction (ligands 14 and 15). Figure 4D shows the docking poses comparison of ligand 12 with ligand 15. The phenyl group A in 4-10 is driven out of the cleft. To get a better understanding of the effect of 1,4-benzodiazepine-2,5-diones compounds 3D structure on their bioactivities, the docked positions with the minimized CDOCKER interaction energies (E CD ) of the 59 compounds are shown in Figure 5. Although the structural information can be obtained by viewing the ligand poses in the cleft of HDM2 and there are many SAR analyses everywhere (4, 15), these SARs cannot directly give the bioactivity values. The QSAR can give an explicit predicted value for a new compound and QSAR studies are vital to enable the prioritization of analogues resulting from iterative virtual screening and thus, the design of focused small molecule libraries around active ligands. Recently, there are some reports on QSAR study on MDM2 inhibitors (22, 42) and most recently Wang et al. (12) built some QSAR models using CoMFA and CoMSIA for BDPs and peptides as HDM2 inhibitors. However, there are no explicit quantitative equation model reported on 1,4-benzodiazepine-2,5-dione compounds though they are a kind of most potent entities as HMD2 inhibitors. A quantitative equation QSAR model of HMD2 inhibitors is more convenient to be used for the discovery of new antitumor drugs based on p53-HDM2 interaction. Therefore in this article, we used the protocol of Calculate Molecular Properties in DS 2.1 and E-Dragon 1.0 to calculate various physiochemical properties as the candidate descriptors for the construction of QSAR models of the substituted 1,4-benzodiazepine-2,5-dione compounds. Since the number of properties is larger than the kinds of obtained ligands, it should be reduced to a much small number than that of ligands in order to obtain an equation statistically. Due to the large number of properties, GFA method was employed to eliminate the properties with less relation to the bioactivity of ligand and keep the subset consisting of the most correlation to the potency of ligands. In this study, the physiochemical properties of BDPs from DS 2.1 were directly treated with the GFA protocol of this software, but the molecular descriptors from E-Dragon were treated with GFA in Material Studio 4.0. The linear term was used for the development of models with Friedman LOF smoothness parameter of 0.5 and the population size of 1000. The obtained QSAR model using linear term was then further treated with G/PLS and the model on the descriptors from DS 2.1 is as follows:  Ligands 23, 26-29 with R 1 substituents have the high LogD and the ligands 52, 54 and 55 with the higher Shadow_Xlength also have higher pIC 50 values. Num_StereoAtoms reflects that the fewer chiral atoms a ligand has, the higher the pIC 50 value it possesses (for example, ligand 1). The observed and predicted pIC 50 results and the values of physiochemical properties of the 59 ligands are listed in Table 2. The plot of the observed pIC 50 vs. the predicted data is shown in Figure 6. It can be seen that the predicted data by this model is basically in accordance with the experimental results. As a whole, it is only considered as a moderate QSAR model. In order to further improve the model quality, obtaining more descriptors is necessary. Thus, we collected 1620 kinds of molecular descriptors of BDPs using E-Dragon online tool. The QSAR model was obtained using GFA in MS 4.0. The QSAR model obtained is as follows:  Figure 7. The plot of the observed pIC 50 vs. the predicted data Figure 6. Plot of observed vs. predicted HDM2 inhibitory activities of different ligands in Table 1  The plot of the observed pIC 50 vs. the predicted data with Equation 2 is shown in Figure 7 and it illustrates that Equation 2 is a better model compared with Equation 1.

Prediction of some new HDM2 inhibitors
Based on the mentioned analysis, we designed some new compounds (Figure 8, I-III) and evaluated them with Equation 2. The predicted bioactivities are listed in Table 3. These compounds have higher inhibitory potency and their activities can be verified through chemosynthesis and FP assay lately. The docked pose of III using CDOCKER is shown in Figure  9.
From the structure of these coumpounds, we can see that the cyclopentane group gives the stability of the ligand conformation accormadating the cleft space. The extra aryl group at R 3 improved the hydrophobic interaction between HDM2 and the ligands. It is to be clarified that this model is based on the molecule-based (FP assay) potency. Considering the cellular activity of these compounds, the water-solubility should be improved through introducing some polar groups. From this point, compound III may be more applicable as potent HDM2 inhibitors.

Conclusions
In order to explore appropriate binding mode of different 1,4-benzodiazepine-2,5-dione compounds as HDM2 inhibitors, CDOCKER protocol in DS 2.1 was employed to conduct the docking process using a dataset of 59 1,4-benzodiazepine-2,5-dione compounds from literatures (4, 15, 31, 32) and the binding situation was analyzed. Our docking pose of freshly prepared model of ligand 25 corroborates to the crystal structure, which indicates the reliability of this docking procedure. The docking results indicate that hydrophobic interaction between the imidazole group in HIS96 and the aryl ring in R 3 may be one of the key factors for the combination of the ligands with HDM2. The binding cleft of HDM2 is predominantly hydrophobic and largely nonspecific Van Der Waals contacts are responsible for the interaction between the ligand and HDM2 hydrophobic pocket. Although the pocket of HDM2 is mainly hydrophobic, the hydrogen bonds can be formed with VAL93 residue in HDM2 for some ligands. The extra hydrogen bond formation can increase the inhibitory potency of the ligands to some extent. The chirality species of substituents in R 1 to R 4 influence the ligands' potency seriously. Various 1D-3D physiochemical properties such as structural and Jurs parameters, and also topological descriptors from DS2.1 and E-Dragon 1.0 were selected as candidate descriptors to build two QSAR models separately. The QSAR model Equation 1 (with eight descriptors) built using more than 100 descriptors from DS2.1 was obtained. It can explain 71.2% of the variance (R 2 adj ) while it could predict 67.2% of the variance (R 2 adj ). F > F (a = 0.05) = 2.13 shows that the model is in the confidence interval of 95%. In order to improve the QSAR model quality, more than 1600 descriptors obtained from E-Dragon 1.0 were used to build the QSAR model with 11 descriptors (Equation 2). It can explain 85.5% of the variance (R 2 adj ) while it could predict 81.7% of the variance (R 2 cv ). For both models, F > F (a = 0.05) = 2.13 shows that they are in the confidence interval of 95%. With these models, the bioactivities of some new compounds were predicted. The results show that the formation of cyclopentane ring at α-C or introduction of extra aryl group at original phenyl group of R 3 can improve the ligands' predicted pIC 50 values, which may improve the HDM2 inhibitory potency.